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Nonparametric Reconstruction of the Dark Energy Equation of State 
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A basic aim of ongoing and upcoming cosmological surveys is to unravel the mystery of dark 
energy. In the absence of a compelling theory to test, a natural approach is to better characterize 
the properties of dark energy in search of clues that can lead to a more fundamental understanding. 
One way to view this characterization is the improved determination of the redshift-dependence of 
the dark energy equation of state parameter, w{z). To do this requires a robust and bias- free method 
for reconstructing w^z) from data that does not rely on restrictive expansion schemes or assumed 
functional forms for w{z). We present a new nonparametric reconstruction method that solves for 
w{z) as a statistical inverse problem, based on a Gaussian Process representation. This method 
reliably captures nontrivial behavior of 'w{z) and provides controlled error bounds. We demonstrate 
the power of the method on different sets of simulated supernova data; the approach can be easily 
extended to include diverse cosmological probes. 

PACS numbers: 98.80.-k, 95.36.+X 



I. INTRODUCTION 

The discovery of the accelerated expansion of the Uni- 
verse P, Q poses perhaps the greatest puzzle in funda- 
mental physics today. A solution of this problem will pro- 
foundly impact cosmology and could also provide key in- 
sights in reconciling gravity with quantum theory. Driven 
by these motivations, the fundamental aim of ground 
and space based missions such as the Baryon Oscillation 
Spectroscopic Survey (BOSS) [3|], the Dark Energy Sur- 
vey [1], the Joint Dark Energy Mission (JDEM) @, the 
Large Synoptic Survey Telescope (LSST) [6^ - to name 
just a few - is to unravel the secret of cosmic accelera- 
tion. In search of the underlying explanation, theoretical 
approaches fall into two main categories: (i) dark energy 
- invoking a new cosmic ingredient, the simplest being 
a cosmological constant, and (ii) modified gravity - in- 
voking new dynamics of space-time (for a recent review, 
see Ref. Q)- In this paper we consider only the dark 
energy alternative, and, for the moment, ignore possible 
modifications of general relativity. 

A fundamental difficulty in dark energy investigations 
is the absence of any single compelling theory to test 
against observations. Consequently, much of the work 
in this area has followed the approach to parameterize 
dark energy by its equation of state w = p/ p (where p is 
the pressure, and p the density), see, e.g., Ref. dy- 
namical models of dark energy such as quintessence fields 
lead to a time- varying equation of state 'oj. Data anal- 
ysis efforts therefore focus on characterizing this time- 
dependence. Current observations are consistent with 
the existence of a cosmological constant. A, {w = —1), 
at the 10% level, the time- variation being unconstrained 
(for recent constraints on w, see e.g. Refs. [13, Ell)- The 
implied value of A is, however, in utter disagreement with 



simple theoretical estimates of the vacuum energy, being 
too small by a factor > 10^". It is therefore an ad hoc 
addition with no hint of a possible origin, hence the fo- 
cus on dynamical explanations, e.g., field theory models 
or modified gravity. Although a detection of any time 
or, equivalently, redshift-dependence in w{z) would im- 
mediately rule out a cosmological constant, such obser- 
vational imprints must necessarily be subtle, otherwise 
they would have been discovered already. This is the 
motivation behind constructing a robust framework with 
controlled error bounds that allows a reliable extraction 
of w{z) from diverse datasets. 

Shortly after the discovery of the accelerated expan- 
sion, it was pointed out that a reconstruction program 
(an inverse analysis of data) for dark energy working 
directly with observational supernova data is computa- 
tionally possible (see, e.g., Refs. [l^ [l^ for early ap- 
proaches). Soon a large number of papers followed, sug- 
gesting many different ways of reconstructing diverse 
properties of dark energy, e.g. Refs. [l3 - |20j : a review 
on dark energy reconstruction methods including a com- 
prehensive list of references is given in Ref. [2l| . Broadly 
speaking, reconstruction techniques fall into two classes, 
the first being those based on parameterized forms for 
w{z) such a.s w = const., w = wq + w'z [23 - |23 | or 
w = wo — Waz/{l + z) m,!!^. These possess the virtue of 
simplicity but can have serious shortcomings due to lack 
of generality and error control (specifically issues of bias, 
see, e.g., [23]), especially as one goes to higher redshifts. 

The second class consists of nonparametric methods 
that aim to solve the inverse problem of determining 
the actual function wlz) given observational data, rather 
than just the parameters specifying some assumed form 
of w{z). The hope is to avoid the possible biasing of re- 
sults due to specific assumptions regarding the functional 
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form of w{z), which may turn out to be incorrect. The 
difficulty with direct reconstruction methods as apphed 
to supernova data is that extracting the desired informa- 
tion formally involves taking a second derivative of the 
- unavoidably noisy - luminosity distance-redshift rela- 
tion, and the robustness and error control of the resulting 
reconstruction can therefore be suspect. 

A separate alternative to the direct reconstruction ap- 
proach for w from the data, is to falsify classes of dark 
energy models. For example, in Ref. [28| different gen- 
eral forms for w are considered that capture different 
dynamical dark energy models. A hypothesis test is then 
carried out for these models to determine how likely they 
are given current data. In the best case scenario, entire 
classes of models can be excluded in this way. In Ref. [l^ 
classes of dark energy models are falsified by carrying 
out a combined analysis of the growth of structure and 
the expansion history of the Universe from cosmic mi- 
crowave background (CMB) and supernova data. This 
approach takes advantage of the fact that a viable dark 
energy model must be consistent with measurements of 
both of these relatively orthogonal probes of dark en- 
ergy. As pointed out in Ref. [20] the falsification of the 
smooth dark energy class would be very interesting, and 
a different paradigm for explaining the accelerated ex- 
pansion such as a modification of gravity on very large 
scales would be required. Hypothesis testing therefore 
provides an interesting alternative to the direct recon- 
struction approach. In fact, in order to convincingly ex- 
clude a cosmological constant from future measurements, 
both approaches should be employed, with the aim of ar- 
riving at a consistent conclusion. 

Given finite data sets, there are - broadly speaking - 
two ways in which one can go wrong in the reconstruc- 
tion task, (i) errors due to the assumption of the wrong 
shape of w{z), as discussed above and (ii) errors due to 
the complex nature of the high-dimensional space within 
which the inverse problem is being attempted, in partic- 
ular, problems due to the existence of degeneracy direc- 
tions. In this paper, our aim is to address the first of 
these problems, i.e., to develop a technique that is suf- 
ficiently flexible, yet not dangerously susceptible to new 
error sources as a result of the extra degrees of freedom. 
The second aspect of the inverse problem, the difficulty 
of dealing with degeneracy directions (as seen in the ex- 
amples below), is not directly addressed here. This issue 
requires sensitivity analyses and a formalism for incor- 
porating multiple data sources and will be treated else- 
where [30|. 

In the current paper, we propose a new, nonparametric 
reconstruction approach that solves the associated statis- 
tical inverse problem by sampling the posterior distribu- 
tion using Markov Chain Monte Carlo (MCMC) meth- 
ods, while representing w{z) by a Gaussian Process (GP). 
Traditionally, GP modeling is a nonparametric regres- 
sion approach based on a generalization of the Gaussian 
probability distribution. It extends the notion of a Gaus- 
sian distribution over scalar or vector random variables to 



function spaces. While a Gaussian distribution is speci- 
fied by a scalar mean /i or a mean vector and a covariance 
matrix, the GP is specified by a mean function and a co- 
variance function [sil, [s^]- CPs have been successfully 
applied in astrophysics and cosmology to construct pre- 
diction schemes for the dark matter power spectrum and 
the CMB temperature angular power spectrum [33l - [36| . 
to model asteroseismic dat a [3 Z], and to derive photo- 
metric redshift predictions [38l l39j . Here we will use the 
GP modeling approach - in concert with MCMC ~ to 
reconstruct w{z) from supernova observations, and not 
as a data interpolation or regression tool applied directly 
to observational or computed data, as is most often the 
case. 

As of now, supernova datasets hold by far the most 
information about possible time dependence of wlz), 
though baryon acoustic oscillation (BAO) and CMB mea- 
surements contain complementary information (see, e.g., 
Ref. [l^l for a recent combined reconstruction analysis). 
Although the GP approach can be easily extended to ac- 
commodate more than one observational probe, for clar- 
ity we will restrict ourselves in this paper to supernova 
measurements only. A more inclusive methodology will 
be presented in future work (3(| . 

Since current data quality does not allow placement 
of strong constraints on a possible redshift dependence 
of w{z), we create a set of simulated data of JDEM-like 
quality to demonstrate and test our new method. We 
consider three models, one with a constant equation of 
state and two with varying w{z). Our new approach will 
be shown to perform well in capturing nontrivial devia- 
tions from a constant equation of state and in providing 
reliable error bounds. 

The paper is organized as follows. In Section |ll] we 
provide a brief overview of how supernova data are used 
to constrain the equation of state of dark energy. We 
describe the simulated data sets and their error proper- 
ties in Section IIIII In Section IIVI we introduce differ- 
ent reconstruction methods and present our approach in 
the same section, contrasting our nonparametric method 
with results obtained using the popular parametric forms 
of Refs. m [11]. We conclude in Section El Details of 
the implementation of the GP-based MCMC algorithm 
are given in an Appendix. 

II. MEASURING EXPANSION HISTORY WITH 
SUPERNOVAE 

Type la supernova measurements are currently the sin- 
gle best source of information regarding possible devia- 
tions of w{z) from a constant value. The luminosity dis- 
tance as measured by supernovae is directly connected 
to the expansion history of the Universe described by the 
Hubble parameter H{z). For a spatially flat Universe, the 
relation is given by 

dL{z)^{l + z)— / — , (1) 
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where c is the speed of hght, Hq, the current value of the 
Hubble parameter {H{z) = a/a, where a is the scale fac- 
tor and the overdot represents a derivative with respect 
to cosmic time), and h{z) = H{z)/Hq. The assump- 
tion of spatial flatness is in effect an "inflation prior", 
although there do exist strong constraints on spatial flat- 
ness when CMB and BAO observations are combined 
(see, e.g., Ref. [4l|). In principle, we can relax this as- 
sumption, but enforce it here to simplify the analysis. 

Instead of (ii(z), supernova data are usually specified 
in terms of the distance modulus /j, as a function of red- 
shift. The relation between /i and the luminosity distance 
is 



fisiz) = TUB - Mb = 51oj 



5 log 
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-25 (2) 
51ogio(-ffo)+25, 



where we used Eqn. ([T|). Mb is the absolute magnitude 
of the object and ttib the (B-band) apparent magnitude. 
Writing out the expression for the Hubble parameter h{z) 
in Eqn. ([2]) explicitly in terms of a general dark energy 
equation of state for a spatially flat FRW Universe leads 
to the relation 



fiBiz) = 25 - 51ogio(i/o) 
-H51ogio((f + z)c /" ds[Q,n{l + sf 



(3) 
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Note that Hq cannot be determined from supernova mea- 
surements in the absence of an independent distance mea- 
surement. Thus Hq can be treated as unknown and ab- 
sorbed in a re-definition of the absolute magnitude: 



Mb = Mb - 5log^QHQ + 25, 



(4) 



which accounts for the combined uncertainty in the ab- 
solute calibration of the supernova data, as well as in 
Hq. Using this, the B-band magnitude can be expressed 
as TTiB = 5logiQDL{z) +Mb where DLiz) = HqcLl^z) 
is the "Hubble-constant-free" luminosity distance. The 
measurement of ^b is only a relative measurement and 
M B allows for an additive uncertainty which can be left 
as a nuisance parameter. To simplify our notation, we ab- 
sorb 51ogio(^^o) — 25 into our definition of the distance 
modulus, leading to: 

Ms =Ms + 51ogio(i^o)-25 = 51ogio[Di(z)]. (5) 

With this definition of the distance modulus we have cal- 
ibrated the overall off-set of the data to be zero. To 
account for uncertainties in the calibration, we introduce 
a shift parameter with a broad uniform prior. 

Given a set of observations for ^ib{z) with associated 
errors, the task at hand is to solve the statistical inverse 



problem, i.e., to extract the corresponding w{z) by in- 
verting the stochastic version of Eqn. ([3]), i.e., inverting 
a nonlinear smoothing operator, which can be viewed for- 
mally as requiring taking two derivatives of the (noisy) 
data, the key difficulty to be overcome in reconstruction. 

As previously stated, the present quality of supernova 
data is not good enough to determine the equation of 
state beyond a cosniological constant (i.e., use of the pa- 
rameterized form w — const.). To do better than this, 
both systematic and statistical errors need to be brought 
under further control. Such systematic errors can oc- 
cur due to, e.g., uncertainties in luminosity corrections 
and therefore in distance estimates, or the fitting pro- 
cedure for the supernova light curves; for a recent dis- 
cussion of these issues, see Ref. [12]. Larger numbers of 
supernovae, especially at high redshifts, are needed to 
get firm constraints on a possible variation in w (see, 
e.g., Refs. ^,535). Future supernova surveys, especially 
space-based, hold the promise to remedy this situation. 
We therefore explain our method for reconstructing 'w{z) 
with simulated data that mimics the expected quality of 
future space-based observations. We turn now to a de- 
scription of the simulated datasets. 



III. SYNTHETIC DATASETS 

In this section we introduce three synthetic datasets 
which we will use to compare the GP approach to pa- 
rameterized methods for estimating w{z). Synthetic 
datasets have three important attributes: (i) The un- 
derlying "truth" is known and one can therefore impose 
a quantitative measure on how well each method per- 
forms, (ii) The data quality can be controlled, e.g., we 
mimic the expected data quality from future space-based 
supernova surveys, (iii) Dark energy models with very 
different equations of state w{z) can be synthesized. 




FIG. 1: Redshift distribution of supernovae from the three 
simulated datasets investigated. In addition to JDEM mea- 
surements of supernovae, we assume a low redshift sample of 
300 supernovae for z < 0.1. The bin width is Az = 0.1. 
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FIG. 2: Three simulated datasets. The upper row shows A/is (the data itself with the corresponding value for a ACDM model 
subtracted, red crosses) as a function of redshift, z, and the exact A/is for the corresponding model (blue line). The lower 
panels show the behavior of the equation of state wl^z) as a function of redshift. The underlying model for the first dataset is 
a cosmological constant. The second and third datasets are based on quintessence models. The third dataset has been chosen 
to test our reconstruction method on a nontrivial equation of state. 



We assume the measurement of n ~ 2300 supernovae, 
distributed over a redshift range of < z < 1.7 with 
larger concentration of supernovae in the mid-range red- 
shift bins (0.4 < z < 1.1) and at low redshift (z < 0.1). 
Figure [T] shows the detailed distribution of the supernova 
data with respect to redshift. To create the simulated 
data, we begin with points for flB shifted off-center ac- 
cording to some error model for the distance modulus 
(Gaussian variance). The distance modulus error can be 
related to that in Dl by differentiating Eqn. ([5|) to yield 
Sfi — (5/ln 10)(5D^/i?L). For each supernova, we pro- 
vide a measurement for the distance modulus fli and we 
assume a statistical error of tv, = 0.13, as expected from 
future surveys such as JDEM [4J] . For our purposes here, 
it is sufficent to use a simplified error model where the 
errors are the same for all supernovae and independent of 
redshift. We also do not explicitly introduce systematic 
errors. We represent the measured points in the following 
form: 



fit = a{zi) + ti 



(6) 



In this notation, the observations jii follow a normal dis- 
tribution with mean a(zi), the standard deviation being 
set by the distribution of the error, e^, representing a 
mean-zero normal distribution with standard deviation, 
TiCT. Here, is the observed error and a accounts for a 
possible rescaling. In addition, we assume that the errors 
are independent. The assumption of normal distributed 



errors in magnitude space is consistent with the error dis- 
tribution of real observations as quoted in Ref . [l3| • For 
each of the datasets we choose ri,„ = 0.27. The three 
simulated datasets and corresponding equations of state 
are shown in Figure [2] 

Dataset 1: The first dataset is that for a cosmological 
constant with a constant equation of state, w — —1. 

Dataset 2: The second dataset is based on a 
quintessence model with a minimally coupled scalar field. 
The equation of motion for the homogeneous mean field 
is <j)+3H<j)+dV/d(p = 0. The equation of state parameter 
is given by 



- V{^) 
02/2 + ^(0)' 



(7) 



The particular choice of potential used here is V{4>) = 
Vo0^. This model predicts a relatively small variation in 
the equation of state as a function of z as can be seen in 
the middle panel in the lower row in Figure [2] 

Dataset 3: The last dataset is based on a quintessence 
model described in Ref. [4^. This model has a dark 
energy equation of state of the form 



w{z) ~ wq + (m 



^^^i±oM^,Hi+^ (8) 



l-exp(A,-^) 
exp(Ar') + exp{At\l + z*)"!) 



exp(Ar'(l 



exp(A,-i(l + zt)-i) 
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FIG. 3: Simulated data for Model 1 (w = —1) with error 
bars in red. The green and blue line show the exact distance 
modulus /is for Model 2 and 3. Note the very small size of 
the difference. 

with the constants having the values Wq — —1.0, Wm — 
-0.5, zt = 0.5, At = 0.05. This model has w > -1 
everywhere, therefore it can in principle be realized by a 
quintessence field. The time variability of the equation 
of state has an S-shaped form as shown in the right lower 
panel in Figure [5] The parameter choices for this model 
lead to a steeper transition in w{z) from w = —1 to 
w = —0.5 than natural for most quintessence models. 
Therefore, compared to dataset 2, this scenario is less 
realistic. Our choice of this dataset is dictated by the 
fact that it cannot be easily fit by any of the currently 
used parametric reconstruction methods. (It represents 
a general class of models with equations of state that can 
exhibit rapid changes.) 

Figure [3] and the upper panels in Figure [5] give a vi- 
sual impression of the difficulties posed by reconstruc- 
tion. Figure [3] shows the simulated data for Model 1 
with error bars and the exact jiB for Models 2 and 3, 
which are hardly distinguishable by eye. Figure [5] shows 
the differences A/is for each dataset with respect to a 
ACDM model with w = —1; models with nontrivial 
w{z) show relatively small deviations from the horizontal 
line. As we demonstrate below, inverse modeling using 
Gaussian processes can successfully discriminate between 
these marginal differences and reconstruct the dark en- 
ergy equation of state reliably within stated errors. 



IV. RECONSTRUCTION OF THE DARK 
ENERGY EQUATION OF STATE 

As discussed previously, the dark energy equation of 
state is not directly measurable from the luminosity 
distance- redshift relation, given in Eqn. The obvious 
idea of first fitting for ^b{z) and then extracting w{z) by 
taking two derivatives must deal with the noise in the 



data and the filtering required to estimate the deriva- 
tives. Experience with inverse problems has shown that 
such approaches can easily yield unsatisfactory results. A 
detailed discussion on the shortcomings of this approach 
can be found in, e.g., Ref. [46j . 

A simpler alternative is to assume a hopefully well- 
motivated parametric form for 'w{z) and then fit for the 
parameters (for an early discussion about the advantages 
of this approach, see, e.g., Ref. [1^). For example, if 
we assume w to be constant, the integral over 'w{z) in 
Eqn. can be solved analytically and the best-fit value 
for w can then be determined from measurements of 
via, e.g., maximum likelihood techniques. Current data 
are in good agreement with a constant w at the 10% level 
(for a recent analysis see Ref. [lO] and references therein 
for earlier results). Going beyond this, a weak redshift 
dependence of w{z) may be assumed. One way to re- 
alize this is a Taylor expansion of 'w{z) in its redshift 
evolution, of the form w = + WaZ, as suggested in 
Refs. |22H24j . However, this parameterization is not well 
suited for z > 1, the regime that holds the most promise 
to distinguish different models of dark energy [26,43. In 
Ref. [11], the form w — wq — WaZ / {1 + z) is suggested as 
a better alternative (also given previously in Ref. j25|). 
This parameterization has several nice features: it is well 
behaved beyond z = 1, it has only two parameters and is 
therefore relatively easy to constrain, and it captures the 
general behavior of different classes of dynamical dark en- 
ergy models. The major disadvantage is that the param- 
eterization will only allow reconstruction of monotonic 
behaviors of w{z). More involved parameterizations have 
been suggested to address this problem; overviews can be 
found in Refs. J, 21]. Although parameter estimation is 
technically much easier than reconstruction^h can have 
shortcomings due to poor control over bias |27| . 

Nonparamctric reconstruction methods have received 
less attention, in part because the current data qual- 
ity does not fully justify the use of sophisticated inverse 
methods. Nevertheless, with future data quality in mind, 
nonparametric techniques can be a powerful alternative 
for extracting information about 'w{z). They can capture 
more complex behavior in 'w{z) and - in principle - can 
prevent the existence of bias due to a restricted param- 
eterization. Early nonparametric approaches involve a 
smoothing procedure for either or related quantities 
at a characteristic smoothing scale, see, e.g. [iSl fisj. 

A somewhat intermediate approach is apiecewise con- 
stant description of w{z) (see, e.g., Ref. [43|) using basis 
functions such as top-hat bins or wavelets ^2^. In the 
extreme case of one bin for the whole data range, this 
method is equivalent to the w — const, parametriza- 
tion. Determining the optimal number of bins informed 
by the data is therefore important though not straightfor- 
ward. Too few bins would erase important information, 
too many bins would enhance noise to (incorrect) infor- 
mation. In Ref. [l6j . four redshift bins were used, while 
Ref. 01 used five redshift bins over a smaller redshift 
range. In order to obtain uncorrelated estimates of the 
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dark energy parameters in the different bins, a principal 
component analysis is carried out first. This method has 
been used recently by the JDEM Figure of Merit Science 
Working Group 49] to assess the performance of JDEM 
with respect to constraining the dark energy equation of 
state. In Ref. [48] a combined analysis of diverse data 
sets has been performed based on this method and found 
no evolution in w{z). In contrast to the piecewise con- 
stant description of the dark energy equation of state, our 
approach represents w{z) by a continuous Gaussian pro- 
cess, the parameters specifying the process - the so-called 
hyperparameters - being completely determined as part 
of the solution of the inverse problem. It is important 
to distinguish the GP hyperparameters from the param- 
eters of a conventional parametric method. The GP ap- 
proach is nonparametric, the hyperparameters specifying 
aspects of the prior distribution in a Bayesian approach 
(such as properties of the allowed classes of functions). 
One advantage of this degree of freedom is that one can 
explicitly use it to test the sensitivity of the posterior 
distribution to assumptions made about the prior, e.g., 
the order of differentiability. Here, we make no binning 
assumptions or assumptions of the discrete properties of 
the GPs, favorable when working with a physical process 
that is assumed to be continuous in nature. 

In this paper we will study the ansatz w = const, and 
the parameterization suggested in Refs. [1^, [2^ as ref- 
erence standards to compare with the GP modeling ap- 
proach. As a simplification, in the first step of our anal- 
ysis, we will assume knowledge of the value of flm and 
assume perfect calibration, i.e. — 0. In the next step, 
we will drop these assumptions and include the param- 
eters as part of the estimation process, as would be the 
case in a more realistic scenario (albeit without directly 
including non-supernova datasets). To provide a context 
for the GP approach we will first present an analysis with 
parameterized models. 



A. Parametric Reconstruction 

In the study of parametric_reconstruction, we follow a 
Bayesian analysis approach . We focus the analysis on 
two of the previously discussed models: w = const. = wq 
and w{z) — wq — Waz/{1 + z) and use MCMC algorithms 
to fit for the model parameters [Fl'l , resulting in posterior 
estimates and probability intervals for flm, and the pa- 
rameters that specify the form of w{z). We have consis- 
tent priors in all of our models (including the GP model 
described in the next section) so the results are readily 
comparable: 



7r(u;o) ^ C/(-25,l), (9) 

TTiWa) - C/(-25,25), (10) 

7r(l]™) ^ N (0.27,0.04^), (11) 

^(A^) ^ C/(-0.5,0.5), (12) 

7r(cr2) cx o--^ (13) 



and the likelihood 

^<..«(^)"=.(4|:(^)1. 

(14) 

where 9 encapsulates the cosmological parameters to be 
constrained, i.e., a subset or all of {wo,Wa,^m}, and 
A^. Here the notation "~" simply means "distributed 
according to" . U is a uniform prior, with the probability 
density function f{x;a,b) = 1/(6 — a) for x € [a,b] and 
otherwise. is a Gaussian (or Normal distributed) 
prior with the probability density function f{x;^,a'^) — 
exp[— (x — /x)^/(2(T^)]/v27r(T^. The squared notation for 
the second parameter in N{p, a^) is used to indicate that 
a is the standard deviation (to prevent possible confu- 
sion with the variance cr^). (The parameters in the U 
distribution do not have this same meaning of mean and 
standard deviation as in the Normal distribution.) For 
each case we study, we confirm that the MCMC chains 
converged by monitoring the trace plots and checking for 
good mixing and stationarity of the posterior distribu- 
tions. 

The prior for f2,„ is informed by the 7-year WMAP 
analysis [52j for a wCDM model combining CMB, BAO, 
and Hq measurement. Since our assumptions on w are 
less strict than w — const, we broaden the prior by a fac- 
tor of two, leading to a Gaussian prior given in Eqn. pT|) . 
As discussed earlier, we also allow for an uncertainty in 
the overall calibration of the supernova data, A^. We 
choose a wide, uniform prior for A^ given in Eqn. (jl2l) . 
We consider two cases in all the analyses presented in 
this paper. In the first case we fix rim to a fiducial value 
and reconstruct w{z). This allows us to focus on biases 
due to assumed parametric forms (parametric models) 
or possible shortcomings due to the ill-posedness of the 
inverse problem (GP methodology). In the second case, 
we let flm be a free variable within the specified prior, 
allowing us to study problems with degeneracies that are 
highlighted when w is a nontrivial function of redshift. 

1. Constant Equation of State 

The simplest extension beyond a cosmological constant 
is to assume that w{z) is redshift independent. In this 
case, Eqn. ([S]) simplifies to 

flBiwo,z) = 51ogio {(H- z)c / ds[nrnil+sf 

JQ 

+ (1 -an)(l+ 5)^(1 +5)3^"°]"'^'}. (15) 

Current data are in good agreement with this assump- 
tion. We will use the ansatz w — const. = as a first 
test in attempting to reconstruct all three datasets. As 
discussed previously, an MCMC algorithm is employed 
with the chain being run about 10,000 times. Conver- 
gence is very quickly attained, within about the first one 
hundred iterations. 
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FIG. 4: Reconstruction results for w for datasets 1-3 (left to right) assuming w = const, and Q.m and fixed at their fiducial 
values. The black dashed curve shows the "truth" and the red curve, the reconstruction results. The dark blue shaded region 
indicates the 68% confidence level, while the light blue shaded region extends it to 95%. The assumption w = const, makes it 
impossible to capture the time dependence in datasets 2 and 3. 
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FIG. 5: Results as in Figure |4l but letting Sim and vary. The result for dataset 1 is very accurate - wq is very close to the 
true values. The predictions for datasets 2 and 3 are poor, not only for wq, but also for the incorrect biasing of Qm (see text). 



Figure [4] shows the results for the case where we fix 
= 0.27 and assume perfect calibration. As expected, 
the reconstruction works extremely well for the model 
where in fact w = const, (left panel). The best fit value 
for Wq and its probability intervals (Pis) are given in 
Table U] and match the chosen value within small errors. 
Not surprisingly, the results for the models with time 
varying w are rather inaccurate. For dataset 2, the value 
for w is predicted slightly higher than the average would 
be. In general, a larger w leads to a lower A/is. As can 
be seen in Figure[21 AflB{z) is slightly below the ACDM 
model for this dataset. The one-parameter best fit for wq 
therefore has to be high in order to capture this behavior, 
if we do not allow any other parameter to vary. For the 
third dataset, we find a similar situation. As can be seen 
in Figure [2] in the right panel, A/ig is below the fiducial 
model. Capturing this behavior with only one parameter 
to vary, wq, leads to a value > — 1 in order to fit the 
behavior in Aflg^z) reasonably well. 

In the next step, we allow Qm and A^ to vary within 
the assumed priors given in Eqs. (ITl]) and (IT^ . The re- 



TABLE I: w = const. - 95% Probability Intervals (Pis) 



Set 



wo 



-1.003tHi^ 
-0.862«:«1} 

u.yio_Q g]^3 

i.UZl_g j^QO 

-0.849+°"«s 



-0.091 
■10(7+0.117 
■-'-"O-0.120 



0.27 
0.27 
0.27 

r, 970+0.022 
\J.Z,I 0_Q Q27 

U.ZOO_o Q3g 
Q47+0.018 






-0.003tHrr 
-0.005lHlt 



'^•^'-o.os 
n 97+0.06 

^•^'-0.05 

n qq+0 06 

'-0.05 
n 07+0 06 

u.yi _o.o5 



-0.006ini^, 0.97+° °^" 



0.05 



suits for w (including the truth) are shown in Figure [5] 
The best fit values including error bars are given in Ta- 
ble m Since flm and wq are highly correlated they must 
be sampled jointly with a covariance structure obtained 
after running the process for some time. As in the case 
of and A^ fixed, the analysis is robust and works 
well for the case oi w = const. Although the error bands 
increase, the best fit values for all three parameters are 
very close to the assumed model values. In the two cases 
of variable w, the strong degeneracy between wq and 
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FIG. 6: Upper row: same as in Figure 14] but with the reconstruction based on the parameterization of w represented by 
Eqn. (|16|l . The parameterization captures the variation in dataset 2 reasonably well, but is still not flexible enough to reconstruct 
an equation of state with less smooth changes - as in dataset 3. The lower panels shows the 68% and 95% confidence contours 
for the fitting parameters wo and Wa in Eqn. (|16|l for the three datasets. Note that the axes of the contour plots have different 
ranges; the uncertainties for dataset 3 are the largest. 



becomes very apparent, as both wq and fi™ influence the 
behavior of /xb in a very similar way: This bias on w 
will disappear if other datasets such as CMB data are 
included to provide good constraints on ilm- For the sec- 
ond dataset, where A/is has a downward trend for higher 
z, the behavior can be captured by a low value for flm 
and a high value for w, or vice-versa. The best fit values 
will also account for the curvature in A/Ib and we find 
that the best-fit model underpredicts fi™ and overpre- 
dicts Wq. This degeneracy can only be broken if we have 
better estimates for $7^- For dataset 3 the situation is 
even more severe: in order to capture the slope of A/is, 
the estimates for both parameters, flm and wq, are off, 
Q.m is highly overestimated, while the value for wq is un- 
derestimated and in fact does not even go through the 
true w{z) any more as can be seen in Figure [S] This 
example demonstrates the bias that can be introduced 
in the reconstruction of w{z) if the assumed form for w 
is too restricted and degeneracies are present. Also note 
that the true result no longer falls within the predicted 
error bands. 

For both datasets, the prediction for A^, which is 
mainly anchored by the amplitude of the measurements 



for fiB, is close to the true value. We also note that the 
"truth" for Vim and A^ is not exact since we are working 
with one finite realization for each dataset. 



2. Wo — Wa Parameterization 

We now turn to the investigation of a commonly used 
parameterization of the dark energy equation of state 
given by Refs. fH,!!]: 

Wiz) = Wo ~ Wa— . (16) 

1 + z 

As for the case w = const., one integral in Eqn. ([3]) can be 
solved analytically and the expression for p,B simplifies 
to: 

P-B{wo,Wa,z) = 

51ogio{(l + z)c / ds[n^{l + sf 
Jo 

+ {1- nm){l + s)3('"o-'"»+i)e3"'»''/(i+'^)J "^''^1 . 

(17) 
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FIG. 7: Results shown as in Figure |6] but allowing Qm and to vary. 



This parameterization allows for a weak monotonic time 
dependence in w and should therefore capture the behav- 
ior of our second model reasonably well. 

Following the analysis in the previous case {w = 
const.), we first fix flm and to their fiducial values. 
The results are summarized in Figure[6]and Table ITll For 
the w = const, dataset the parameterization picks up a 
very small variation in w but the prediction w = — 1 is 
well within errors. The mild variation with z in the sec- 
ond dataset is captured reasonably well. A rather high 
value for wq leads to a pull-down of AfiB. This is then 
compensated by a large positive value for Wa which leads 
to an upturn in Ajlg. 

For the third dataset the parameterization is not quite 
flexible enough. While the overall behavior (the rise at 
high redshift) is captured, the S-shape of the underlying 
equation of state cannot be extracted. Moreover, in an 
attempt to fit the data, the value of equation of state 
today is decreased to wq < —1. This decrease leads to 
an upturn of Ap,B{z) while the large negative value for 
Wa acts in the opposite direction. The parameterization 
finds a time dependence in w, but not of the correct spe- 
cific form as would be required for distinguishing different 
models of dark energy. 

The results including estimations for flm and A^ are 
similar. As for the w = const, parameterization, the 



parameters are all sampled jointly because of their strong 
correlations. These correlations degrade the accuracy of 
the w reconstruction. For the first dataset, the prediction 
for is slightly high which in turn amplifies a time 
dependence in the best fit for w which does not exist in 
the original dataset. Again, the error bars are large and 
clearly w = — 1 is well within the error bounds. For the 
second data set, the prediction for is rather accurate. 
For dataset 3, flm is overpredicted which leads to a slight 
degradation in the prediction for w itself. The values for 
Wo and Wa are similar to the case of fixed flm. 

Overall, the parameterization provides a reasonable 
description of the data, especially for moderately vary- 
ing w, as is expected. The drawback is obvious: rapid 
changes in w as shown in dataset 3 cannot be captured. 



Nonparametric Reconstruction: Gaussian 
Process Model 



The previous exploration of the standard parametric 
methods makes it clear that as long as the data corre- 
spond to the models that the methods are designed for 
(e.g., if w is in fact constant, the ansatz w = wq will 
obviously lead to the best result), the results are rather 
good. However, as soon as datasets are introduced for 
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TABLE II: 


It) = lUO — 


2/(1+2) - 


95% Pis 


Set 


wo 






Am 


1 
2 
3 
1 
2 
3 


-1 002+l!'!;^;i 

-0.826+«'0fg 

— u.uoy 
i.lUO_o.o59 

-0.998+j;'i?t 

n SO7+0.107 
"U.82('_o 

1 177+0.129 

'-0.156 


u.uvjo_o 3g5 

0.203+0^2^ 

l.UOO_o 307 

0.306li:I°^ 

n Qiq+1.305 
— i.UOZ_o 


0.27 
0.27 
0.27 

n 0S1 +0.053 
U.ZO±_o.o77 

r, r,7r, + 0.068 

U.Z(Z_o 094 

0.284tHf7 






U.UUi_o.oi8 
U.UUO_0.oig 
U.UlZ_o 019 


Set 










1 

2 
3 
1 
2 
3 


"■^'-0.06 
n 07+0.06 
'J-^' -0.05 
07+0.06 

-0.05 
n 07+0.06 
"•^'-0.05 
07+0.06 

-0.05 
07+0.06 









which the parameterizations are not flexible enough to 
track the true behavior of w{z), the analysis is suscepti- 
ble to unacceptable levels of bias in determining modeling 
and cosmological parameters. 

It is common practice to employ a parametric form 
for w{z) and then assess the robustness of the result 
by a goodness of fit test. For example, Ref. [2^ uses 
a Bayesian information criterion (BIG) statistic for this 
task. We applied the BIG model comparison criteria for 
the two parametric reconstruction approaches without 
much success - instead of choosing the model that cor- 
responds to the truth (which in case of dataset 2 and 
3 is the wq — Wa parametrization over w = const.) the 
BIG always preferred the parametric form with the least 
parameters, in this case w = const. 

A nonparametric form of w{z) can address some of 
the shortcomings of parametric reconstruction methods. 
We now describe a new, nonparametric method based 
on GP modeling [3l|, Is^]. As mentioned previously, a 
GP is a stochastic process, which in our case is indexed 
by z. The defining property of a GP is that the vector 
that corresponds to the process at any finite collection 
of points follows a multivariate normal (MVN) distri- 
bution. Gaussian processes are elements of an infinite 
dimensional space, this is the sense in which they pro- 
vide a nonparametric method for curve fitting. They are 
characterized by a mean and a covariance function, often 
defined by a small number of hyperparameters. 

We assume that the data errors are Gaussian and use 
the same likelihood as in the treatment of the parameter- 
ized models. The use of Bayesian estimation methods (in- 
cluding the MGMG algorithm) allows us to estimate the 
hyperparameters of the GP correlation function together 
with any other parameters, comprehensively propagat- 
ing all estimation uncertainties 15 1|. Using the definition 
of a GP, we assume that, for any collection 2:1,..., z„, 
w{zi), ...,w{zn) follow a multivariate Gaussian distribu- 
tion with a constant negative mean and exponential co- 
variance function written as 

X(z,z') = (18) 




0.0 0.5 1.0 1.5 2.0 2.5 3.0 0.0 0.2 0.4 0.6 0.8 1.0 12 3 4 5 6 

FIG. 8: Example for the priors (red lines) and posteriors 
(black lines) for a^, p and for dataset 1. The parameters 
has a non-informative prior, has an inverse Gamma prior, 
and p has a Beta prior. The posteriors for the three different 
datasets for these parameters are very similar. 

Here (0 e (0,1) is a free parameter that, together with n 
and the parameters defining the likelihood, are fit from 
the data. The form of the assumed correlation function 
implies that, theoretically, there is non-zero correlation 
between any two points, p controls the exponential decay 
of the correlation as a function of distance in redshift, but 
it does not provide a bound for the correlation between 
two points. This is analogous to the concept of standard 
deviation - many, but not all, of the observations are 
within one standard deviation of the mean, and most are 
within two, but there is no theoretical bound that all 
observations have to fall within. In principle, we could 
include an explicit noise term in the correlation (a so- 
called "nugget"). Instead, we chose to include the noise 
term in the likelihood equation from which it will 
propagate to the GP. 

The value of a S (0, 2] infiuences the smoothness of the 
GP realizations: for a — 2, the realizations are smooth 
with infinitely many derivatives, while a = 1 leads to 
rougher realizations suited to modeling continuous non- 
differentiable functions. Here we use a = 1 to allow for 
maximum fiexibility in reconstructing w. (For a com- 
prehensive discussion of different choices for covariance 
functions and their properties, see Ref. [l^l.) The mean 
of the GP is taken to be fixed, p has a prior of Beta{6, 1) 
and has a prior of /G(6, 2). IG is an inverse Gamma 
distribution prior, with the probability density function 
fix;a,P) = /3"a;-"-iF(a)-iexp(-^/a;), with a; > 0. 
The probability distribution of the Beta prior is given 
by/(x;a,/3) =r(a + /3)x"-i(l-x)'5-V[F(a)F(/3)]. Fig- 
ure[5]shows an example of the priors for and the two 
GP model parameters p and k for dataset 1. The poste- 
riors indicate that the data are informative about and 
reasonably informative about p. As for the parametric 
reconstruction, flm is given a prior based on currently 
available estimates. 

We set up the following GP for w: 

w{u) GP{-l,K{u,u')). (19) 

Ghoosing a mean value of -1 is natural, given current 
observational constraints on w. Even though the mean 
is fixed, each GP realization will actually have a differ- 
ent mean with a spread controlled by k. For the second 
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FIG. 10: As in Figure [9] but with Sim and A,j free to vary. The GP reconstruction performs extremely well for all the first two 
cases and captures the third case reasonably well (within error bands). 



and third dataset we adjusted the means during the anal- 
ysis to slightly different values suggested by preliminary 
runs. This adjustment is purely informed by the data and 
demonstrates the flexibility of the approach. In principle, 
the mean could also be left as a free parameter. After the 
adjustment we measured the posterior mean and ensured 
that it was close to the prior mean. Table Hill summarizes 
the prior and posterior means for the final analysis. 

Next, recall that we have to integrate over w{u) 



(Eqn.O: 

yis)=r^du. (20) 

Jo 1 + M 

We use the fact that the integral of a GP is also a GP with 
mean and correlation dependent on the original GP [3l| . 
Therefore y{s) results in a second GP defined as: 



TABLE III: Prior and posterior means for w{u) for the GP 
models. 



Set 



Prior Mean Posterior Mean 



1 (fim, A^) fixed 


-1.00 


-1.01 


1 (rim, A^) free 


-1.00 


-1.02 


2 {Qm, A^) fixed 
2 (nm, A^,) free 


-0.94 


-0.90 


-0.87 


-0.88 


3 (fim, A^) fixed 


-0.7 


-0.71 


3 {Qm, A^) free 


-1.00 


-1.04 



y s^GP -In 1 + 5 ,^^2 / / ^ 1 
\ Jo Jo {l + u){l + u') J 

(21) 

where we choose a — 1. The mean value for this GP is 
simply obtained by integrating Eqn. (|20p for the mean 
value of the GP for w{u). We can now construct a joint 
GP for y{s) and w{u): 



y(s) 

w{u) 



GP 



ln(l + s) 

-1 







S12 






S21 


S22 





, (22) 
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TABLE IV: GP model - 95% Pis 



Set 




Am 


1 


0.27 





2 


0.27 





3 


0.27 






0.270 



+0.032 
-0.043 
3+0.046 



0.263lo.o5i 

n Q97+0.040 



-o.oos^n^^ 



n 07+0.06 
-0.05 

"•^'-0.05 
07+0 06 
-0.05 
n 07+0 06 

n 07+0 06 
U.a'-0.05 



0.915«:«f3 

0.802«i« 
n OQ7+0.100 

u.oai -0.266 

0.903««?^ 

n ec:9+0.143 
"■"'-'^-0.319 



n Qoo+0.339 

U.OOO_Q 

0.4061°;^?^ 
0.343l«:?- 

n Q4Q+0.394 
n +0.403 



with 



S22 



S12 — S2I — K 



+ 



(23) 
(24) 
(25) 



The mean for y{s) given w{u) can be found through the 
following relation: 

{y{s)\w{u)) = -ln(l + s) + Ei2S22i [w{u) - (-1)]. (26) 

Note that we never have to calculate the double inte- 
gral in Sii which would be numerically expensive. More 
details about each step in the GP model algorithm are 
given in Appendix El 

Following our practice for the parameterized recon- 
struction methods, we first apply the GP-based technique 
fixing the values for flm and A^. The results are shown 
in Figure[H] Reconstruction from the GP model for w(z) 
is remarkably accurate for all three data sets. (The noise 
in the predictions is due to the choice of the functional 
form of the covariance function.) In particular, the recon- 
struction for the third dataset, where the parameterized 
model did not fare well, is very good. Table HVl gives the 
results for the GP model hyperparameters p and k for 
all three models. Larger values for p indicate a smoother 
reconstructed function. For a model with more variation 
in the data and w crossing the mean several times, the 
correlation length would be smaller than for the models 
investigated here. Our analysis shows that the p is the 
smallest for the dataset that varies the most (dataset 3), 
as expected. Since even the third dataset is not vary- 
ing strongly, p is still close to one (note that p = 1 is 
not allowed). In addition, the interplay between p and n 
which determine the overall covariance function is non- 
trivial - the simultaneous fitting of k and p is therefore 
an important aspect of our approach. 

Last, we study the results from the GP model, letting 
rim and vary. The results are shown in Figure [TUl As 
for the two parameterized models, degeneracies degrade 
the results in the cases of varying w. For the case of 
w = —1 the prediction for is very close to the input 
value (see Table |IV] for the best-fit values for fl^ and 
and the GP model parameters), and the reconstruc- 
tion for w, albeit somewhat noisy, is very close to the 



truth. For the second model, the best-fit value for flm 
is slightly low, but correct within the error bars. The 
reconstruction is also in this case very accurate. For the 
third dataset the best fit value for $7™ is above the true 
value, leading to a lower result for w{z). The GP model 
approach captures the overall behavior of the true w{z) 
within the error bands. However, the problem due to the 
degeneracy between w and 57^ becomes very apparent 
in this case. The GP model approach finds a solution 
that overestimates Vim with a rather flat w. As we show 
in the next section, this is a good fit to the data but 
does not give a particular good match to the true w{z). 
These results will certainly improve if we have stronger 
constraints on fi™ from different datasets, e.g., CMB or 
baryon acoustic oscillation measurements to break the 
degeneracies between 0,^ and w{z). 



C. Comparison of the Different Approaches 

In order to summarize our findings, we now provide 
a brief comparison of the parametric and nonparametric 
reconstruction results. We consider two metrics for this 
comparison: (i) the accuracy of the reconstructed form 
of w{z) given that the exact answer is known, and (ii) 
the accuracy with which the predicted wlz) fits the data. 
As mentioned before, more involved statistical techniques 
such as BIG did not lead to satisfying results, as the sim- 
plest parametrization was always identified as the best - 
which is obviously not the case for datasets 2 and 3. 

Table |V] shows a simple measure of how well the exact 
functional form of w{z) has been captured by the three 
different approaches. We calculate the mean square error 
of the reconstructed history for w{z) with respect to the 
perfect input w{z) as shown in the lower panels of Fig. [5] 
- the smaller the error the better the reconstruction re- 
sult. This simple test has two minor shortcomings - first, 
it does not account for the realization noise in the history 
of w{z) underlying the simulated data so the error will 
never be zero. In order to obtain the true w{z) we would 
have to perform two derivatives of the noisy simulated 
data which would render this test basically meaningless. 
Second, the error bands of the predictions are not taken 
into account. Nevertheless, this comparison should pro- 
vide some information about how well the different meth- 
ods perform compared to each other. 



TABLE V: Mean squared error for the reconstructed 'w{z) 
w.r.t. the exact w{z). 





Dataset 1 


Dataset 2 Dataset 3 


Wo (fim, A^ fixed) 
Wo (fim, A^ free) 


0.00001 
0.00004 


0.0044 
0.0058 


0.1000 
0.3050 


Wo - Wa (rim, A^ fixed) 
Wo — Wa (fim, A^ free) 


0.00003 
0.0175 


0.0001 
0.0019 


0.0034 
0.0109 


GP (J7m, A^ fixed) 
GP (STim, Am free) 


0.0006 
0.0006 


0.0003 
0.0012 


0.0014 
0.1439 
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FIG. 11: Residuals for ^ considering the predictions for w 
for dataset 3 with flm and free. The left plot shows the 
result for the wo — Wa parametrization, the right plot shows 
the results for the GP approach. 



For the first dataset w — const, the parametric re- 
construction ansatz w — wq provides - not surprisingly 
- the best results; the history for w{z) is captured ex- 
tremely well as is il™ with small errors. For dataset 1, 
the GP model provides a more accurate answer than the 
Wo ~ Wa parameterization in the case of fi™ free. This 
is mainly due to the fact that once the parameterized 
form has picked up some curvature in w{z), the recon- 
structed w{z) will depart more and more from w = const 
at higher z. The GP model however is flexible enough 
to avoid such a behavior and stays close to w = — 1 over 
the whole redshift range. For dataset 2, the GP model 
performs slightly better than the two parametric recon- 
struction approaches for similar reasons as for dataset 
1. The wq — Wa parametrization picks up some time- 
dependence in the low-z regime which overestimates the 
curvature of w{z) at higher z while the GP approach re- 
constructs w{z) reasonably well over the whole redshift 
range and therefore has a smaller mean square error. For 
the third dataset, the mean square error for the GP model 
is smallest in the case of flm fixed but for flm free kept 
it is worse than the result from the wq — Wa parametriza- 
tion. For this last case (dataset 3 and and free) we 
employ another assessment of the accuracy of the predic- 
tion which highlights the well-known degeneracy between 
w and ilm,. We only consider the ~ Wa parametriza- 
tion and the GP model approach for this test, since the 
w = const, parametrization has obvious shortcomings in 
this case. 

For each case we fit the w{z) result and then we find 
the associated fit for /.t. Then we determine the difference 
between the predicted ^ and the input /i for our simulated 
data. We show the residuals in Figure [TTl The left figure 
shows the residuals for the wq — Wa parametrization, the 
right figure for the GP reconstruction. The solution for 
w{z) found with the GP model is clearly a good fit to the 
data - it performs slightly better than the parametrized 
form in the low redshift range. Due to the the degeneracy 
between w and VLm the overall reconstruction of w{z) is 



on the other hand worse for the GP model - this result 
will improve with tighter constraints on VLm and comple- 
mentary datasets such as BAO measurements which help 
to break the degeneracy. 



V. CONCLUSIONS 

Characterizing the behavior of the dark energy equa- 
tion of state is a first step in understanding the nature 
and origin of dark energy. Although a simple cosmo- 
logical constant model is consistent with current obser- 
vations, the implied numerical value has no theoretical 
explanation. Alternative dynamical models of dark en- 
ergy generically predict time variations in w and a robust 
detection of such a time dependence is one of the first 
targets in dark energy studies. Supernova measurements 
remain a very promising probe of w{z) and future sky 
surveys can in principle measure w{z) with high accu- 
racy. 

In order to fully exploit the power of future measure- 
ments, a reliable and robust reconstruction method is 
required. In this paper we have introduced a new recon- 
struction approach based on GP modeling. The approach 
is nonparametric with modeling hyperparameters con- 
strained directly from the data. We have demonstrated 
that we can extract nontrivial behavior of w as a function 
of redshift with data of the quality expected from future 
surveys. We have contrasted our new method against 
two approaches, an assumed cosmological constant, and 
one with a simple two-parameter model of the variation 
of w{z). Both of these models are effective descriptions 
for only a limited class of possible behaviors of w{z). In 
contrast, the generality of the GP approach results in ac- 
curate reconstruction of potentially complex variability 
in 'w{z). 

The GP model approach makes only mild smoothness 
assumptions about 'w{z) which are reasonable if we ex- 
pect that the accelerated expansion of the Universe is 
due to a physically well motivated reason. The major 
ingredient for the GP model is specified by the covari- 
ance function K{z, z'). While the choice of the specific 
form for K{z, z') is up to the modeler, the GP approach 
is rather robust to this choice and the major hyperpa- 
rameters infiuencing K{z, z') are informed by the data 
themselves. In addition to choosing a covariance func- 
tion, we have to specify a set of priors for cosmological 
and model parameters. These priors have to be broad 
enough to include the truth but should not be so broad 
that the Bayesian approach does not converge. Both 
model and cosmological parameters are then jointly de- 
termined from the data. While the Bayesian approach is 
computationaly rather intensive, it has the great advan- 
tage that it provides robust error bands. The approach 
outlined here for the analysis of supernova measurements 
can easily be extended to include difi^erent cosmological 
probes such as data from CMB and BAO observations; 
work in this direction is currently in progress. More- 
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over, the GP-based MCMC procedure can be integrated as a cosmology fitter, following the general methodology 
within supernova analysis frameworks, e.g., SNANA [53] presented in Ref. [34| |. 



Appendix A: GP model algorithm 

In this appendix we provide implementation details of the GP algorithm used to reconstruct w{z). The GP model 
approach requires the estimation of several variables: the correlation hyperparameters (p and k^), the Gaussian 
process points [w{u) with u = or rather y{s) with s = (si, ...s„j.^), the variance parameter (tr^), along 

with the physical parameters of interest (57„j and A^). il„j and can be added as extra steps; for simplicity we 
do not include them in the discussion here (we did include them in our analysis presented in the main body of the 
paper.) 

A Gaussian process is defined by its mean and correlation function. We set the prior mean of the Gaussian process 
to —1 for stability; other values are found to be equally acceptable when the true mean of w{z) is near —1. Even 
though the mean is fixed, the posterior mean will not be exactly —1 but will have a distribution spread around — 1. 
In the case when the true mean is not —1 the posterior value of the mean of w{z) will indicate this. Preliminary 
exploration of the posterior of w{z) can then be used to set the prior mean for subsequent runs, which provides more 
stable results. 

In the correlation function, the parameter a should be set as a constant beforehand in the range of 1 to 1.9999, 
setting it exactly equal to 2 can cause numerical instability in the covariance matrix, and values above 2 are not 
mathematically valid values for the process. The parameter a controls the smoothness of the overall GP: a < 2 will 
produce a flexible continuous GP but it will not be differentiable anywhere, while a = 2 produces a much smoother 
continuous GP that is infinitely differentiable everywhere. In order to allow for maximum flexibility we choose a = 1 
throughout the paper. The correlation length p is a free parameter in the GP model and its value is informed by the 
data. It is highly correlated to a and which makes the interpretation of its final value not straightforward. It is 
strictly limited to the region of [0,1) and the GP is not defined for the limiting case of p — 1. After investigation we 
found that p has a much smaller role in the determination of the nature of the GP in comparison to a and . 

Integration of the Gaussian process requires a grid for numerical integration. Let n be the number of supernova 
data points in our dataset, and m be a finite number of Gaussian process points over this region for evaluation. 
During the integration we add h — 1 partition points between each GP point. Our resulting integrated process y{s) 
has m • h points. This provides a dense enough grid to carry out an accurate numerical integration for the outer 
integral without slowing down the computations. We find that an m around 50 to 100 is sufficient, and an h between 
3 and 5 is a good balance between accuracy and speed. 

As with the parametric models, we employ Bayesian methods where we use our priors and likelihood to obtain a 
posterior that can then be sampled with an MCMC algorithm. In our case, with our given likelihood function (given 
in Eqn. ([H])), this leads to the following posterior for the parameters of interest: 

a^,p, K^lfi, T^,z(x L{z, fi, T\yiv),a^)MVN{y{v)\p, K^)TT{p)Tr{K^)TT{a^). (Al) 

y(v) here denotes an arbitrary GP with parameters and p. is the unknown variance parameter from the 
likelihood equation and z, /i, and r are "observed" values from the simulated dataset. 

Instead of the usual Gaussian process formulation shown in Eqn. (lAip , we choose an altered form to allow for slower 
changes in the GP and a more localized search. We let y{v) - MyiV(-l,S) and Y.-^l'^{y{v) - (-1)) = y°{v) - 
MT^A^(0, /). This leads to the following posterior: 

L(z, p, T\y°(v),p, K^,a^)MVN{y°{v)- 0, I)^{p)i:{k')ti{<j''). (A2) 

In this setup, we propose a GP y°(v) and transform it to y[v). We then keep track of y°(v) and make our next 
proposal based on these values and not on the y(v). Thus we are allowing the proposal to make finer changes each 
time to boost the acceptance rate, which tends to be problematic if we were to propose y[v) directly. 

The explicit procedure for estimating the process parameters is as follows and employs standard MCMC techniques. 

1. Initialize all variables: p = pi, = k\, and w°(u) = w(u) will be a vector with m points in our GP 

and y(^s) has m • h points. We run this algorithm g = 1, ...,(5 times. Set all tuning parameters, i5i^2.3, which 
need to be tuned until good mixing occurs. 



2. Propose p* = U{pq-\ - (5i,Pq-i + h) 
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(a) Compute the covariance matrix K22* = 

(b) Compute the Cholesky decomposition for K22* — U'JJf, 

(c) Compute the special Ki2f = JJ* (iu/3*l"^''l°/(l + u) with Chebyshev-Gauss quadrature. 

(d) We want yp.(s) = - ln(l+s) + [K2_^ii:i2,] [K^_iii:2Vj(wp. (u)- (-1)) where Wp. (m) = [Kg_iC/^]<„ + 



yp.{s) = - ln(l + ,s) + [kI_,K,2A[^1_,K22A-'{{^,-iK.w'^^,^^ + (-1)) - (-1)) 

= - in(i + ,s) + «,_ii^i2* [([/;. t/p.)-'f/;.]<,-i 

= - ln(l + S) + Kq-lKl2*[Up}]w°-i^ ,j_^ 

(e) L{z, ^,T\yp. ,(T^_^) — exp ^— "^^^'^^""^"^^ ^ ^ where the definite integrations in T{zi,yp*{u)) are 
done numerically through summations of the trapezoid algorithm. 

(f) Accept the proposal p* with probability ^ ^''*^|p l^t Pq = p* , otherwise pq = Pq^\. 

3. Draw n^* = U{k\_^ - S2, i + 

(a) Compute y^2, (s) = (-1) ln(l + s) + Ki2g[Uq\]w°^^q_i 



(b) L{z, p., T\y^2, , cr^_i) — exp ^— 5 ^"^^ ^ ^ where the definite integrations in T(zi, 2/^2* {u)) 

done numerically through summations of the trapezoid algorithm. 



are 



(c) Accept with probability ^^^"^''^[k^ ^ ) ■ 



4. We propose a non-standard for the GP. We start by drawing a proposal for w"* ^ MVN{w°_i,S3l„ 
where Imxm is the identity matrix. 



(a) Compute y*{s) ^ (-1) ln(l + s) + KqKi2q[U-^]w°* 

(b) i,,^,,|,.(,),,2_^ 2^ 

(c) Accept with probability , ^""mI/ot!! ^°' | o/i letting yg(s) = J/*(s) and the corresponding GP realization is 

Wm,q{u) = W*^{u) 

5. a,2|...^/G(f + 10,iE(^^^^)'+9) 

6. Repeat steps 2-6, Q times and rerun the entire algorithm as needed after resetting the tuning parameters 

I 
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